Non-Stationary Approximate Modified Policy Iteration

Authors

Boris Lesner

Bruno Scherrer

Abstract

We consider the infinite-horizon $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration, a family of algorithms that can interpolate between Value and Policy Iteration, with an error $\epsilon$ at each iteration is known to lead to stationary policies that are at least $\frac{2\gamma\epsilon}{(1-\gamma)^2}$-optimal. Variations of Value and Policy Iteration that build $\ell$-periodic non-stationary policies have recently been shown to display a better $\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma^\ell)}$-optimality guarantee. We describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers $m \ge 0$ and $\ell \ge 1$ that generalizes all the above-mentioned algorithms. While $m$ allows one to interpolate between Value-Iteration-style and Policy-Iteration-style updates, $\ell$ specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved $\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma^\ell)}$-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all $m$ and $\ell$; to our knowledge, this tightness was previously known only in two specific cases, Value Iteration ($m = 0$, $\ell = 1$) and Policy Iteration ($m = \infty$, $\ell = 1$).
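To get a feel for how much the period of the non-stationary policy helps, the two guarantees quoted in the abstract can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names `stationary_bound` and `nonstationary_bound` and the sample values of the discount factor, per-iteration error, and period are assumptions chosen purely for illustration.

```python
def stationary_bound(gamma: float, eps: float) -> float:
    """Guarantee for stationary policies: 2*gamma*eps / (1 - gamma)^2."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma * eps / (1.0 - gamma) ** 2


def nonstationary_bound(gamma: float, eps: float, ell: int) -> float:
    """Guarantee for ell-periodic policies: 2*gamma*eps / ((1 - gamma)(1 - gamma^ell))."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if ell < 1:
        raise ValueError("ell must be at least 1")
    return 2.0 * gamma * eps / ((1.0 - gamma) * (1.0 - gamma ** ell))


# Illustrative values only (assumed, not taken from the paper).
gamma, eps = 0.9, 0.01
print("stationary:", stationary_bound(gamma, eps))
for ell in (1, 2, 5, 10):
    print("ell =", ell, "->", nonstationary_bound(gamma, eps, ell))
```

With a period of 1 the two expressions coincide, and the non-stationary guarantee shrinks as the period grows, approaching $\frac{2\gamma\epsilon}{1-\gamma}$ in the limit.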

Similar resources

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon stationary γ-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal policy that is stationary, we show that when using value function approximation, looking for a non-stationary policy may lead to a better performance guarantee. We def...

On Approximate Stationary Radial Solutions for a Class of Boundary Value Problems Arising in Epitaxial Growth Theory

In this paper, we consider a non-self-adjoint, singular, nonlinear fourth order boundary value problem which arises in the theory of epitaxial growth. It is possible to reduce the fourth order equation to a singular boundary value problem of second order given by $w'' - \frac{1}{r} w' = \frac{w^2}{2r^2} + \frac{1}{2}\lambda r^2$. The problem depends on the parameter $\lambda$ and admits multiple solutions. Therefore, it is difficult to...

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms that compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an $\epsilon$-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direc...

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we ca...

Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method

The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed point iteration method. This method was then implemented for solving a system of time-fractional chemical engineering equations. The ob...

Publication date: 2015
